Language Modeling and Document Re-Ranking: Trinity Experiments at TEL@CLEF-2009
Authors
Abstract
This paper presents a report on our participation in the CLEF-2009 monolingual and bilingual ad hoc TEL@CLEF tasks involving three different languages: English, French and German. Language modeling is adopted as the underlying information retrieval model. Because the data collection is extremely sparse, smoothing is particularly important when estimating a language model. The main purpose of the monolingual task is to compare different smoothing strategies and investigate the effectiveness of each alternative. For the monolingual and bilingual tasks, this retrieval model is then used alongside a document re-ranking method based on Latent Dirichlet Allocation (LDA), which exploits the implicit structure of the documents with respect to the original queries. Experimental results demonstrated that the three smoothing strategies behave differently across the test languages, while the LDA-based document re-ranking method needs further work before it can bring significant improvement over the baseline language modeling systems in the cross-language setting.
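The abstract does not name the three smoothing strategies it compares. A common trio for query-likelihood retrieval is Jelinek-Mercer, Dirichlet-prior and absolute-discounting smoothing, and the minimal sketch below assumes those three. It scores each document by the log-likelihood of the query under a smoothed document model, falling back on the collection model p(w | C) for unseen terms. The function names, the toy corpus and the parameter values (lambda = 0.5, mu = 10, delta = 0.7) are illustrative assumptions, not the settings used in the experiments, and the LDA-based re-ranking step is not shown.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood collection model p(w | C) over all tokens."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(w, doc_counts, doc_len, p_coll, method, param):
    """Smoothed document model p(w | d); param is lambda, mu or delta."""
    c = doc_counts.get(w, 0)
    pc = p_coll[w]
    if method == "jm":  # Jelinek-Mercer: fixed interpolation weight lambda
        return (1 - param) * c / doc_len + param * pc
    if method == "dirichlet":  # Bayesian smoothing with Dirichlet prior mu
        return (c + param * pc) / (doc_len + param)
    if method == "absdisc":  # absolute discounting; assumes 0 < delta <= 1
        unique = len(doc_counts)
        return max(c - param, 0) / doc_len + param * unique / doc_len * pc
    raise ValueError(f"unknown smoothing method: {method}")


def query_log_likelihood(query, doc, p_coll, method, param):
    """log p(q | d); query terms unseen in the whole collection are skipped."""
    counts = Counter(doc)
    return sum(
        math.log(smoothed_prob(w, counts, len(doc), p_coll, method, param))
        for w in query
        if w in p_coll
    )


def rank(query, docs, method, param):
    """Return document indices sorted by descending query log-likelihood."""
    p_coll = collection_model(docs)
    scores = [query_log_likelihood(query, d, p_coll, method, param) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy, already-tokenized corpus (illustrative only, not TEL data).
    docs = [
        ["language", "model", "smoothing", "smoothing"],
        ["topic", "model", "lda"],
        ["retrieval", "language", "evaluation"],
    ]
    query = ["language", "smoothing"]
    for method, param in [("jm", 0.5), ("dirichlet", 10.0), ("absdisc", 0.7)]:
        print(method, rank(query, docs, method, param))
```

On this tiny corpus all three methods produce the same ranking, [0, 2, 1]; on a real, sparse collection the choice of method and of its parameter can reorder documents, which is the kind of difference a per-language comparison of smoothing strategies would measure.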
Similar resources
TCD-DCU at TEL@CLEF 2009: Document Expansion, Query Translation and Language Modeling
For the multilingual ad-hoc document retrieval track (TEL@CLEF) at the Cross-Language Evaluation Forum (CLEF), Trinity College Dublin and Dublin City University participated in collaboration. Our retrieval experiments focus on i) investigating document expansion using an entry vocabulary module, ii) translating queries with Google Translate and a statistical MT system, and iii) investigating l...
Smoothing Methods and Cross-Language Document Re-ranking
This paper presents a report on our participation in the CLEF 2009 monolingual and bilingual ad hoc TEL@CLEF task involving three different languages: English, French and German. Language modeling was adopted as the underlying information retrieval model. Because the data collection is extremely sparse, smoothing is particularly important when estimating a language model. The main purpose of the ...
Document Expansion, Query Translation and Language Modeling for Ad-Hoc IR
For the multilingual ad-hoc document retrieval track (TEL) at CLEF, Trinity College Dublin and Dublin City University participated in collaboration. Our retrieval experiments focused on i) document expansion using an entry vocabulary module, ii) query translation with Google Translate and a statistical MT system, and iii) a comparison of the retrieval models BM25 and language modeling (LM). The...
DCU-TCD@LogCLEF 2010: Re-ranking Document Collections and Query Performance Estimation
This paper describes the collaborative participation of Dublin City University and Trinity College Dublin in LogCLEF 2010. Two sets of experiments were conducted. First, different aspects of the TEL query logs were analysed after extracting user sessions of consecutive queries on a topic. The relation between the queries and their length (number of terms) and position (first query or further re...
Experiments with N-Gram Prefixes on a Multinomial Language Model versus Lucene's Off-the-shelf Ranking Scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task)
We describe our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track, where we measured the retrieval performance of LGTE, an indexing engine for geo-temporal collections that is mostly based on Lucene, together with extensions for query expansion and multinomial language modelling. We experiment with an N-gram stemming model to improve on last year's experiments, which consisted in combinati...